MULTIVARIATE STATISTICAL PROCESS MONITORS



This invention relates to multivariate statistical process monitors. The 
term 'process' is used in a broad control theory context to include
controlled devices, plant and controlled systems generally. 

1 INTRODUCTION 

1.1 Background of the Invention

The detection and diagnosis of abnormal situations in the operation of 
industrial processes is a problem of considerable challenge that is 
attracting wide attention in both academe and industry. [Nimmo, 1995] 
outlined that, on its own, the US based petro-chemical industry could 
save up to $10b per year if abnormal situations could be detected, 
diagnosed and appropriately dealt with. The consequences of not being 
able to detect such issues can range from increased operational costs in 
the running of a process to loss of production because of disastrous 
failure of the entire plant. 

The task of detecting and diagnosing abnormal situations in industrial
processes, whether continuous or batch, is difficult. This is because
industrial processes often present a large number of process variables,
such as temperatures, pressures, flow rates and compositions, which are
regularly recorded up to several thousand times a day [Piovoso, 1991],
[Kosanovich, 1992]. This very large amount of data is difficult to
analyse and interpret simply
by observation. Furthermore, it is also often the case that the process 
variables are highly correlated [MacGregor, 1991] and hence the number 
of degrees of freedom within the process is considerably smaller than the 
number of observed process variables. This makes it difficult for even an 
experienced operator to interpret cause and effect interaction by eye. 
However, the recorded data has embedded within it the substance for
revealing the current state of process operation. The difficult issue is to 
extract this substance from the data. 

To address this issue of detection and diagnosis, Multivariate Statistical
Process Control (MSPC) approaches have been successfully employed
[Kresta, 1991], [MacGregor, 1995], [Kourti, 1995]. The MSPC
techniques aim to successively reduce the number of variables which are
required to describe significant variation of the process. The recorded
data are thereby compressed into a set of fewer variables which are
accordingly more manageable and interpretable.

One such MSPC approach is Partial Least Squares (PLS), which was
pioneered by H. Wold in the mid 1960s [Geladi, 1988]. The first
publications on PLS were presented in 1966 [Wold, 1966a; 1966b]. The
PLS method identifies a parametric regression matrix based upon
predictor and response matrices that are constructed from reference data
of the process. The predictor matrix is comprised of the signals of the
manipulated and measured disturbance or cause variables of the process
(predictor variables), whilst the response matrix is comprised of the
controlled or effect variables of the process (response variables). The
PLS algorithm decomposes the predictor and response matrices into rank
one component matrices. Each component matrix is composed of a vector
product in which one vector describes the variation (score vector) and the
other the contribution (loading vector) of the score vector to either the
predictor or response matrix. The decomposition is an iterative approach
for which a pair of component matrices (one for the predictor and one for
the response) is calculated at each iteration step. The regression matrix is
updated at each iteration step as a result of this decomposition. The data
reduction is achieved by compressing the variation of the predictor and
response variables down to the smallest number of score vectors that are
able to effectively describe process behaviour. The selection of the
number of component matrices that need to be retained is a trade off
between maximising the variation explained in the predictor and response
matrices and minimising the number of component matrices. Cross
Validation [Wold, 1978] is most commonly used to define the number of
component matrices to be retained, e.g. [MacGregor, 1991, 1995],
[Morud, 1996].

[MacGregor, 1995] and [Wise, 1996] established that the PLS
decomposition of the predictor matrix can be employed for the condition
monitoring of continuous industrial processes. They also highlight that
this decomposition is similar to a Principal Component Analysis (PCA)
of the predictor matrix. PLS decomposition of the predictor matrix
allows the calculation of two statistics. The first statistic (the T squared
statistic) describes variation of the predictor matrix that is significant for
predicting the response variables. In contrast, the second statistic (the Q
statistic) corresponds to variation in the predictor matrix which is
insignificant for predicting the response variables. Both statistics may be
plotted in statistical monitoring charts with a time base. This approach is
hereinafter referred to as 'approach I'.

Another approach for exploiting PLS as a condition monitoring tool is 
discussed for instance in [MacGregor, 1991] and [Kresta, 1991]. In this 
approach, several statistical plots are used to detect and diagnose 
abnormal process behaviour. The plots are: 



• x-y plots of the squared prediction error of the response variables
versus the score values of each score vector representing the predictor
variables (monitoring charts),

• plots of each combination of two score vectors representing the
predictor variables (scatter plots) and

• plots of the squared prediction error of either the predictor or
response variables versus time (SPE charts).

This approach is hereinafter referred to in this description as 'approach
II'.

1.2 Summary of the Invention

In the following description, an extension to the standard PLS algorithm, 
hereinafter referred to as the 'extended PLS' or 'EPLS', is set forth for
continuous processes. This extension results in the determination of two 
new PLS scores based on the score vectors of the predictor matrix. The 
new PLS score vectors are denoted as generalised score vectors. The first 
generalised score vector describes significant variation of the process 
including the predictor and response variables. The second generalised 
score vector represents the prediction error of the PLS model and 
residuals of the predictor matrix. The EPLS approach gives rise to 
monitoring charts for T squared and Q which are similar to those 
obtained from PCA when both predictor and response variables are 
analysed by PCA. This is distinct from the standard PLS approach which 
only analyses the predictor variables and therefore gives no insight into 
the behaviour of the response variables unless there is feedback in the 
process. 

The advantage of the EPLS monitoring charts is therefore that they 
represent variation of the predictor and response variables together with 
their residuals. This improves the monitoring charts of approach I which 
only describe variation and residuals of the predictor variables. In 
contrast to approach II, EPLS provides the capability to monitor the
process on just two charts, rather than the number of charts being 
dependent upon the number of component matrices. 

According to one aspect of the present invention a method of
designing/configuring a multivariate statistical process monitor by a
partial least squares approach comprises constructing from reference data
of the process predictor and response matrices, the predictor matrix
being comprised of signals of the manipulated and measured disturbance
or cause variables of the process (predictor variables), and the response
matrix being comprised of the controlled or effect variables of the
process (response variables), decomposing the predictor and response
matrices into rank one component matrices, each of said component
matrices being comprised of a vector product in which one vector (the
score vector) describes the variation and the other (the loading vector)
the contribution of the score vector to the predictor or response matrix,
decomposition being performed by the creation of a parametric regression
matrix based upon iterations of the decomposition of the predictor and
response matrices, characterised by the creation of a first generalised
score vector which describes any significant variation of the process
including variations of the predictor and response variables, and a second
generalised score vector which represents the prediction error of the
partial least squares model and residuals of the predictor matrix.

Preferably the generalised scores are calculated by constructing an
augmented matrix, denoted here by Z and of the form

$Z = [Y : X]$,

where X is the predictor matrix and Y is the response matrix, and
constructing a score matrix $T_n = T_n^* - E_n^*$ in which $T_n^*$ and
$E_n^*$ are generally of the form:

$T_n^* = [Y : X]\,[B_{PLS}^{(n)} : I]^{\dagger} R_n$,
$E_n^* = [E_n : F_n]\,[B_{PLS}^{(n)} : I]^{\dagger} R_n$,

the columns of the matrix $T_n^*$ providing the generalised t-scores and the
columns of the matrix $E_n^*$ the generalised residual scores, where $I$
denotes an M×M identity matrix, $\dagger$ denotes the generalised inverse
and $B_{PLS}^{(n)}$ is the PLS regression matrix.

According to a second aspect of the invention we provide a multivariate
statistical process monitor which has been designed/configured in
accordance with the first aspect of the invention and which is so arranged
as to identify abnormal process behaviour by analysing the residuals of
the response variables.

According to a third aspect of the invention we provide a method of
monitoring a process which comprises configuring a multivariate
statistical process monitor by the method of the first aspect of the
invention, and identifying abnormal process behaviour, at least in part,
by analysing the residuals of the response variables.

The invention will now be further described, by way of example only,
with reference to the accompanying Figures which show:



Figure 1 - Schematic Diagram of the Fluid Catalytic Cracking Unit,

Figure 2 - Schematic Diagram of one Fluidised Bed Reactor and its
adjacent Units,

Figures for Fluid Catalytic Cracking Unit without Controller Feedback in
Predictor Matrix:

Figure 3 - Statistical Monitoring Charts for Normal Operating Data
(Upper Charts represent the PLS Monitoring Charts, PLS-T² and -Q
statistics, and Lower Charts show the EPLS Monitoring Charts,
EPLS-T² and -Q statistics),

Figure 4 - Statistical Monitoring Charts for the Unmeasured
Disturbance (Upper Charts represent the PLS Monitoring Charts,
PLS-T² and -Q statistics, and Lower Charts show the EPLS
Monitoring Charts, EPLS-T² and -Q statistics),

Figure 5 - Error Contribution Chart for Time Instance 11460min.
The O2 and CO Concentration in the Stack Gas Flow have the
largest Prediction Error,

Figure 6 - Statistical Monitoring Charts for the Change in the
Regenerated Catalyst Flow into Reactor (Upper Charts represent
the PLS Monitoring Charts, PLS-T² and -Q statistics, and Lower
Charts show the EPLS Monitoring Charts, EPLS-T² and -Q
statistics),

Figure 7 - Error Contribution Chart for the Change of the
Regenerated Catalyst Flow to Reactor at Time Instance 19357min.
The Standpipe Catalyst Level and O2 Concentration in Stack Gas
are mostly affected,

Figures for Fluid Catalytic Cracking Unit with Controller Feedback in
Predictor Matrix:

Figure 8 - Statistical Monitoring Charts for Unmeasured
Disturbance (Coking Factor); Predictor Variables include the Wet
Gas Compressor Suction Valve (Upper Charts represent the PLS
Monitoring Charts, PLS-T² and -Q statistics, and Lower Charts
show the EPLS Monitoring Charts, EPLS-T² and -Q statistics) at
Time Instance 19357min,

Figure 9 - Statistical Monitoring Charts for the Change of the
Regenerated Catalyst Flow to the Reactor; Predictor Variables
include the Wet Gas Compressor Suction Valve (Upper Charts
represent the PLS Monitoring Charts, PLS-T² and -Q statistics, and
Lower Charts show the EPLS Monitoring Charts, EPLS-T² and -Q
statistics),

Figures of the Fluidised Bed Reaction Process:

Figure 10 - Statistical Monitoring Charts for Normal Operating
Data (Upper Charts represent the PLS Monitoring Charts, PLS-T²
and -Q statistics, and Lower Charts show the EPLS Monitoring
Charts, EPLS-T² and -Q statistics),

Figure 11 - Statistical Monitoring Charts for the Unmeasured
Disturbance (Upper Charts represent the PLS Monitoring Charts,
PLS-T² and -Q statistics, and Lower Charts show the EPLS
Monitoring Charts, EPLS-T² and -Q statistics),

Figure 12 - EC-Charts for Steam Pressure Upset at Time Instances
1500min (upper left plot), 1501min (lower left plot) and 1502min
(upper right plot),

Figure 13 - Statistical Monitoring Charts for an abnormal
behaviour of one of the tubes (Upper Charts represent the PLS
Monitoring Charts, PLS-T² and -Q statistics, and Lower Charts
show the EPLS Monitoring Charts, EPLS-T² and -Q statistics),

Figure 14 - EC-Charts for Fluidisation Problem in one of the
Tubes at Time Instances 436min (upper left chart), 888min (upper
right chart), 905min (lower left chart) and 910min (lower right chart).

To demonstrate the usefulness of the EPLS monitoring charts, and to 
make comparison with approach I, two case study examples will now be 
considered. For each, two typical conditions for abnormal behaviour are 
generated, which describe the impact of an unmeasured disturbance as 
well as an "internal" change of the process behaviour. The example 
processes are the simulation of a fluid catalytic cracking unit (FCCU) 
introduced by [McFarlane, 1993] and a real industrial process that 
produces two different solvents as a result of a complex chemical 
reaction carried out in fluidised bed reactors.

The following description is organised as follows. In section 2
both the standard PLS algorithm and the new EPLS algorithm are
described and compared. Section 3 introduces the condition monitoring
statistics associated with these two approaches. Section 4 presents the
application case studies to illustrate the benefits of EPLS.

2 PARTIAL LEAST SQUARES ALGORITHMS 
2.1 The standard PLS algorithm 

The standard PLS identification technique relies on decomposing the
predictor matrix, $X_0 \in \mathbb{R}^{K \times M}$, and the response
matrix, $Y_0 \in \mathbb{R}^{K \times N}$, into a sum of rank one
component matrices [Geladi, 1986]. Both matrices contain K data points;
the predictor matrix consists of M variables and the response matrix of
N variables. Both matrices are usually mean centred and appropriately
scaled prior to the identification procedure. The decomposition of both
matrices is as follows:

$X_0 = \sum_{i=1}^{M} X_i = \sum_{i=1}^{M} t_i p_i^T = T_M P_M^T$
$Y_0 = \sum_{i=1}^{M} Y_i + E_M = \sum_{i=1}^{M} \hat{u}_i q_i^T + E_M = \hat{U}_M Q_M^T + E_M$ ,    (1)

where $X_i$ and $Y_i$ are the component matrices of the predictor and
response matrix, respectively. According to equation (1), the rank one
matrices can be calculated as a vector product between $t_i$ and
$\hat{u}_i$, defined as score vectors or latent variables (LVs), and
$p_i$ and $q_i$, defined as loading vectors. M is equal to the number of
predictor variables and $E_M$ represents the prediction error of the
process model. Note that if all component matrices are included, the
predictor matrix is reproduced exactly by its decomposition. If only n
component matrices are included then equation (1) becomes:

$X_0 = \sum_{i=1}^{n} X_i + F_n = \sum_{i=1}^{n} t_i p_i^T + F_n = T_n P_n^T + F_n$
$Y_0 = \sum_{i=1}^{n} Y_i + E_n = \sum_{i=1}^{n} \hat{u}_i q_i^T + E_n = \hat{U}_n Q_n^T + E_n$ ,    (2)

in which $F_n$ represents the residuals of the predictor matrix. The
predicted u-scores, $\hat{u}_i$, can be determined by the following
multiplication:

$\hat{U}_n = [\, t_1 b_1 \;\cdots\; t_n b_n \,] = T_n \,\mathrm{diag}\{b_n\}$ ,    (3)

where $\mathrm{diag}\{b_n\}$ is a diagonal matrix containing the regression
coefficients, $b_i$, of the score model in successive order. A theoretical
analysis of the PLS algorithm can be found in Appendix 1. Different
approaches have been introduced to determine the score and loading
vectors, which are the LSQR algorithm [Manne, 1987], the NIPALS
algorithm [Geladi, 1986], the SIMPLS algorithm [de Jong, 1993] and
others.

Because industrial processes often have strongly
correlated process variables, only a few LVs may be needed to describe
most of the process variation. In contrast, the remaining pairs of LVs
basically accommodate noise and insignificant variation in $X_0$ and $Y_0$
[Geladi, 1986], [MacGregor, 1991] and [Wise, 1996]. To determine the
number of LVs to be retained, Cross Validation [Wold, 1978] and
analysis of variance (ANOVA) [Jackson, 1991] have been discussed.
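The decomposition of section 2.1 can be illustrated with a short NumPy sketch of a NIPALS-style iteration. This is only an illustration under stated assumptions, not the procedure of Appendix 1: the function name nipals_pls, the convergence tolerance, the choice of starting u-score and the synthetic data are all hypothetical.

```python
import numpy as np

def nipals_pls(X0, Y0, n_components, max_iter=500, tol=1e-12):
    """Decompose mean-centred X0 (K x M) and Y0 (K x N) into rank-one parts."""
    X, Y = X0.copy(), Y0.copy()
    T, P, Q, W, b = [], [], [], [], []
    for _ in range(n_components):
        # Assumed start: the response column with the largest variance.
        u = Y[:, np.argmax(Y.var(axis=0))].copy()
        for _it in range(max_iter):
            w = X.T @ u
            w /= np.linalg.norm(w)          # weight vector
            t = X @ w                       # t-score
            q = Y.T @ t
            q /= np.linalg.norm(q)          # response loading
            u_new = Y @ q                   # u-score
            converged = np.linalg.norm(u_new - u) <= tol * np.linalg.norm(u_new)
            u = u_new
            if converged:
                break
        tt = t @ t
        p = X.T @ t / tt                    # predictor loading
        b_i = (u @ t) / tt                  # regression coefficient of the score model
        X = X - np.outer(t, p)              # deflate predictor matrix
        Y = Y - b_i * np.outer(t, q)        # deflate response matrix
        T.append(t); P.append(p); Q.append(q); W.append(w); b.append(b_i)
    stack = np.column_stack
    # After n steps X and Y hold the residuals F_n and E_n of equation (2).
    return stack(T), stack(P), stack(Q), stack(W), np.array(b), X, Y

# Synthetic, mean-centred reference data (purely illustrative).
rng = np.random.default_rng(0)
X0 = rng.standard_normal((200, 5))
Y0 = X0 @ rng.standard_normal((5, 2)) + 0.1 * rng.standard_normal((200, 2))
X0 -= X0.mean(axis=0)
Y0 -= Y0.mean(axis=0)

T, P, Q, W, b, F, E = nipals_pls(X0, Y0, n_components=3)
gram = T.T @ T                              # t-scores are mutually orthogonal
```

With these names, X0 equals T @ P.T + F and Y0 equals T @ np.diag(b) @ Q.T + E, which mirrors equations (2) and (3).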

2.2 Derivation of the EPLS Algorithm and the Generalised Scores 

The EPLS algorithm generates scores which represent the variation of the
predictor and response variables as well as their residuals and they are
referred to as generalised scores. These scores provide the basis for more
effective process condition monitoring than the existing approaches,
which are mainly based on scores that describe variation in the predictor
variables only. The generalised scores are calculated after the weight and
loading matrices are determined (see Appendix 1) and rely on augmenting
the response matrix to the predictor matrix. The augmented matrix is
denoted by Z and is defined as follows:

$Z = [Y : X]$ .    (4)

Note that the subscript 0 on both matrices is omitted. This is because
the derivation of the generalised scores relies on the standard PLS
algorithm and the deflation procedure is not required to be carried out
again.
From equation (4), subtracting the predicted response matrix, $\hat{Y}_n$, and
the reconstructed predictor matrix, $\hat{X}_n$, with n LVs retained, gives rise
to the following expression:

$[Y : X] - [\hat{Y}_n : \hat{X}_n] = [E_n : F_n]$ ,    (5)

where $E_n$ is the prediction error of the response matrix and $F_n$ represents
the residuals of the predictor matrix. By incorporating equations (2) and
(3), equation (5) can be rewritten as:

$[Y : X] - T_n\,[\mathrm{diag}\{b_n\} Q_n^T : P_n^T] = [E_n : F_n]$ .    (6)

As shown in Appendix 2, the product of $P_n^T$ with the PLS regression
matrix, $B_{PLS}^{(n)}$, retaining n LVs, is equal to the product of the
diagonal matrix $\mathrm{diag}\{b_n\}$ and $Q_n^T$, that is
$P_n^T B_{PLS}^{(n)} = \mathrm{diag}\{b_n\} Q_n^T$. Integrating this result
in equation (6) yields:

$[Y : X] - T_n P_n^T\,[B_{PLS}^{(n)} : I] = [E_n : F_n]$ ,    (7)

where $I$ denotes an M×M identity matrix. Carrying out a
post-multiplication of equation (7) by the generalised inverse of
$[B_{PLS}^{(n)} : I]$ provides:

$[Y : X]\,[B_{PLS}^{(n)} : I]^{\dagger} - T_n P_n^T = [E_n : F_n]\,[B_{PLS}^{(n)} : I]^{\dagger}$ ,    (8)

where $\dagger$ denotes the generalised inverse. As shown in Appendix 3, the
post-multiplication of equation (8) by $R_n$ (see Appendix 1) leads to a
formula for calculating the scores of the predictor matrix, $T_n$:

$T_n = [Y : X]\,[B_{PLS}^{(n)} : I]^{\dagger} R_n - [E_n : F_n]\,[B_{PLS}^{(n)} : I]^{\dagger} R_n$ .    (9)

In equation (9), the score matrix $T_n$ is equal to the difference of two
matrices. The first matrix relates to the predictor and response matrix
and the second matrix depends on the prediction error of the response
matrix and the residuals of the predictor matrix. The matrix $[E_n : F_n]$ is
referred to as the augmented residual matrix. Defining the matrix
$[B_{PLS}^{(n)} : I]^{\dagger} R_n$ as $C_{PLS}^{(n)}$, the resultant matrix
$[Y : X]\,C_{PLS}^{(n)}$ as $T_n^*$ and the matrix product
$[E_n : F_n]\,C_{PLS}^{(n)}$ as $E_n^*$ simplifies equation (9) to:

$T_n = T_n^* - E_n^*$ .    (10)

The columns of the matrix $T_n^*$ are further referred to as the generalised
t-scores whilst the columns of the matrix $E_n^*$ are denoted as generalised
residual scores. For process condition monitoring, equation (10) provides
scores which describe process variation contained within the absolute
values of the predictor and the response matrix, as well as the prediction
error matrix, $E_n$, and the residuals of the predictor matrix, $F_n$.

The next section describes the derivation of statistics for $T_n^*$ and $E_n^*$
which can be plotted versus time in univariate monitoring charts.
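The relationship between the standard t-scores and the generalised scores can be checked numerically. The following NumPy sketch builds the generalised t-scores and generalised residual scores and compares their difference with the standard t-scores. It rests on assumptions that are not taken from the Appendices: a compact NIPALS-style PLS with a fixed number of inner iterations, R_n computed as W (P^T W)^{-1}, B_PLS computed as R_n diag{b_n} Q_n^T, and synthetic data. All function and variable names are hypothetical.

```python
import numpy as np

def pls(X0, Y0, n, inner_iters=200):
    """Compact NIPALS-style PLS (illustrative only)."""
    X, Y = X0.copy(), Y0.copy()
    T, P, Q, W, b = [], [], [], [], []
    for _ in range(n):
        u = Y[:, 0].copy()
        for _it in range(inner_iters):      # fixed-length power iteration
            w = X.T @ u
            w /= np.linalg.norm(w)
            t = X @ w
            q = Y.T @ t
            q /= np.linalg.norm(q)
            u = Y @ q
        p = X.T @ t / (t @ t)
        b_i = (u @ t) / (t @ t)
        X = X - np.outer(t, p)              # residual of predictor matrix
        Y = Y - b_i * np.outer(t, q)        # prediction error of response matrix
        T.append(t); P.append(p); Q.append(q); W.append(w); b.append(b_i)
    stack = np.column_stack
    return stack(T), stack(P), stack(Q), stack(W), np.array(b), X, Y

rng = np.random.default_rng(1)
K, M, N, n = 300, 6, 3, 2
X0 = rng.standard_normal((K, M))
Y0 = X0 @ rng.standard_normal((M, N)) + 0.2 * rng.standard_normal((K, N))
X0 -= X0.mean(axis=0)
Y0 -= Y0.mean(axis=0)

T, P, Q, W, b, F, E = pls(X0, Y0, n)

R = W @ np.linalg.inv(P.T @ W)              # assumed form of R_n, so that T = X0 R
B = R @ np.diag(b) @ Q.T                    # assumed form of the PLS regression matrix
C = np.linalg.pinv(np.hstack([B, np.eye(M)])) @ R   # [B : I]^dagger R_n

T_star = np.hstack([Y0, X0]) @ C            # generalised t-scores
E_star = np.hstack([E, F]) @ C              # generalised residual scores
identity_holds = np.allclose(T, T_star - E_star)    # equation (10)
```

Under these assumptions the check succeeds because [B : I] has full row rank, so its generalised inverse is a right inverse, and because P^T R is the identity.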

2.3 Comparison with Approaches I and II to Process Condition
Monitoring 

As mentioned above, the existing approaches to process condition 
monitoring are mainly based on the t-scores. [MacGregor, 1995] and 
[Wise, 1996] outlined that in approach I, the discarded and the retained t- 
scores form the basis for two monitoring charts which are discussed in 
the next subsection. Although successful applications of approach I have 
been discussed, e.g. [Kourti, 1995], [Wise, 1996] and [Morud, 1996], 
they do not necessarily detect every kind of abnormal process behaviour. 
This is particularly true if: 

1) Abnormal process behaviour affects mainly the response variables that
are not under closed-loop control. In this case, the abnormal
behaviour does not propagate through to the predictor variables by
controller feedback and therefore remains undetected. With EPLS, the
variation in the response variables will be apparent.

2) The response variables are highly correlated but the predictor
variables are not. In this case, only one statistical chart can be
obtained for the standard PLS approach, the T squared chart. With
EPLS both charts remain relevant, irrespective of the number of LVs
retained.

Approach II relies on scatter plots, x-y charts of the SPE versus
individual t-scores, and SPE charts, e.g. [MacGregor, 1991] and [Kresta,
1991]. If the process consists of a large number of highly correlated
process variables, e.g. a hundred or more, the number of scores
required to capture significant process variation can still be large.
In consequence, with this approach, a large number
of charts may be required and the situation will be cumbersome to
analyse. In contrast, EPLS only requires two charts irrespective of the
dimension of the problem and the number of LVs selected.

One could also use the SPE chart of the response variables in addition to
approach I to overcome the above deficiencies. However, this would
require at most three monitoring charts and the variation of the response
variables is not accumulated in any of these charts. In contrast, the
generalised scores only require two monitoring charts and one of the
generalised scores captures the variation of the response variables. It
should finally be noted that incorporating the generalised scores for
process condition monitoring is similar to the way in which PCA is
employed for monitoring industrial processes [Jackson, 1991].
3 STATISTICS OF THE PLS AND EPLS APPROACH
3.1 Statistics for PLS

For approach I, two statistical monitoring charts can be obtained based
on the decomposition of the predictor matrix. The first monitoring chart
is related to the retained t-scores and describes significant contribution
to the prediction of the response matrix. The second chart is associated with
the variation of the predictor matrix that is captured by the discarded
t-scores. The discarded t-scores describe insignificant and uncorrelated
contribution towards the prediction of the response matrix. However, in
the case where all predictor variables contribute significantly towards the
variation of the response variables, each t-score has to be retained.
Hence, there are no t-scores left for computing the second monitoring
chart.

The first monitoring chart is based on a statistic which is denoted as the
PLS-T² statistic and the second monitoring chart relates to a statistic
referred to as the PLS-Q statistic. Both statistics are defined as follows:
${}^{PLS}T_k^2 = \sum_{i=1}^{n} \left( \frac{t_{ki}}{{}^{T}\sigma_i} \right)^2$ ,
${}^{PLS}Q_k = \sum_{i=1}^{M} \left( \frac{f_{ki}}{{}^{F}\sigma_i} \right)^2$ ,    (11)

where ${}^{PLS}T_k^2$ and ${}^{PLS}Q_k$ represent the PLS-T² and -Q statistic.
Furthermore, $t_{ki}$ denotes the value of the i-th t-score at time instance k and
${}^{T}\sigma_i$ the standard deviation of the i-th score vector of the predictor
variables of the reference data. $f_{ki}$ represents the residual of the i-th
predictor variable at time instance k and ${}^{F}\sigma_i$ is the standard deviation of
the i-th residual variable of the reference data. The notations T² and Q have
been chosen according to Hotelling's T² and Q statistic used in PCA.
Each statistic can be plotted in a monitoring chart versus time. It should
be noted that the normalisation of $t_{ki}$ and $f_{ki}$ is essential to provide a
sensitive statistic. If this is not done then the t-scores with a large
variance, usually the first few, dominate the resultant value of the PLS-T²
and the residuals of the predictor variables that have a large variance
overshadow residuals with relatively small variance. A fault condition
that affects primarily the t-scores or residuals which have small
variations may remain undetected in this case. Furthermore, because each
statistic is then a sum of squared stochastic variables with zero mean and
unit variance (Chi-Squared Distribution), the statistical estimation of
thresholds can be used.
If exceptionally large PLS-T² values occur then the overall process
variation is unusually large compared with the reference data of the
process. This implies that the general process behaviour has considerably
changed or the process has moved to a new operating region. In contrast,
unusually large PLS-Q values indicate that the relationships between the
predictor variables have changed relative to the relationship prevalent
with the reference data.
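As a minimal sketch of how the two approach I statistics might be evaluated, the following NumPy fragment normalises t-scores and predictor residuals by the standard deviations of the reference data and sums their squares per time instance. The function name pls_statistics and the synthetic reference data are assumptions made for illustration only.

```python
import numpy as np

def pls_statistics(T_ref, F_ref, T_new, F_new):
    """Return PLS-T^2 and PLS-Q for each row (time instance) of T_new / F_new."""
    sigma_t = T_ref.std(axis=0, ddof=1)     # std of each retained t-score (reference)
    sigma_f = F_ref.std(axis=0, ddof=1)     # std of each residual variable (reference)
    t2 = np.sum((T_new / sigma_t) ** 2, axis=1)
    q = np.sum((F_new / sigma_f) ** 2, axis=1)
    return t2, q

# Synthetic mean-centred reference scores and residuals (illustrative only).
rng = np.random.default_rng(2)
K, n, M = 500, 3, 6
T_ref = rng.standard_normal((K, n)) * np.array([5.0, 2.0, 0.5])
F_ref = 0.1 * rng.standard_normal((K, M))
T_ref -= T_ref.mean(axis=0)
F_ref -= F_ref.mean(axis=0)

t2_ref, q_ref = pls_statistics(T_ref, F_ref, T_ref, F_ref)
```

Because every score is divided by its own reference standard deviation, the third score (standard deviation 0.5 in this example) carries the same weight as the first (standard deviation 5.0), which is the sensitivity argument made in the text.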

3.2 Statistics for EPLS

The statistical properties of the generalised scores, compared with those
of the standard t-scores, are summarised below.

1) The generalised t-scores as well as the generalised residual scores are
mean centred if the columns in the predictor and the response matrix
have been mean centred prior to the PLS identification.

2) The t-scores of the standard PLS algorithm are orthogonal
[Hoskuldsson, 1988]. In contrast to the standard t-scores, with EPLS
neither score type is orthogonal, irrespective of the number of
retained LVs. A proof is provided in Appendix 4.

To analyse the generalised scores for process condition monitoring, it is
desirable to have statistically independent scores, which requires
orthogonality. In order to achieve orthogonal scores, a singular value
decomposition (SVD) [Golub, 1996] of the generalised scores can be
applied, which results in:



$T_n^* = V_n^{(T)} \Lambda_n^{(T)} W_n^{(T)T}$
$E_n^* = V_n^{(E)} \Lambda_n^{(E)} W_n^{(E)T}$ ,    (12)

where $V_n^{(T)}$, $\Lambda_n^{(T)}$ and $W_n^{(T)}$ describe the SVD of the
generalised t-scores and $V_n^{(E)}$, $\Lambda_n^{(E)}$ and $W_n^{(E)}$ represent
the SVD of the generalised residual scores. The dimensions of these
matrices are as follows: $V_n^{(T)}$ and $V_n^{(E)}$ are K×n matrices and
$\Lambda_n^{(T)}$, $\Lambda_n^{(E)}$, $W_n^{(T)}$ and $W_n^{(E)}$ are n×n
matrices. The columns of the matrices $V_n^{(T)}$, $V_n^{(E)}$, $W_n^{(T)}$ and
$W_n^{(E)}$ are orthonormal and $\Lambda_n^{(T)}$ and $\Lambda_n^{(E)}$ are of
diagonal type. Based on the SVD of the generalised score matrices of the
reference data, the following relationship provides orthogonal scores:

$\tilde{V}_n^{(T)} = T_n^* W_n^{(T)} \Lambda_n^{(T)-1} \sqrt{K-1}$
$\tilde{V}_n^{(E)} = E_n^* W_n^{(E)} \Lambda_n^{(E)-1} \sqrt{K-1}$ ,    (13)

in which $\tilde{V}_n^{(T)}$ and $\tilde{V}_n^{(E)}$ represent orthogonal $T_n^*$
and $E_n^*$ scores with unit variance, respectively (written here with a
tilde to distinguish them from the orthonormal factors of equation (12)).
Including equation (13), the orthogonal scores can be directly calculated
from the augmented data and error matrix, $[Y : X]$ and $[E_n : F_n]$, as:

$\tilde{V}_n^{(T)} = [Y : X]\, C_{PLS}^{(n)} W_n^{(T)} \Lambda_n^{(T)-1} \sqrt{K-1} = [Y : X]\, G_{PLS}^{(n)}$
$\tilde{V}_n^{(E)} = [E_n : F_n]\, C_{PLS}^{(n)} W_n^{(E)} \Lambda_n^{(E)-1} \sqrt{K-1} = [E_n : F_n]\, H_{PLS}^{(n)}$ .    (14)

For the generalised score vectors of the i-th data point,
$\tilde{v}_i^{(T)}$ and $\tilde{v}_i^{(E)}$, the sum of the squared elements
may be used to define a univariate statistic for each vector. These
statistics are denoted as ${}^{T}T_i^2$ and ${}^{E}T_i^2$. ${}^{T}T_i^2$ and
${}^{E}T_i^2$ represent the EPLS-T² and the EPLS-Q statistic and are
defined as follows:

${}^{T}T_i^2 = \sum_{j=1}^{n} \left( \tilde{v}_{ij}^{(T)} \right)^2$ ,
${}^{E}T_i^2 = \sum_{j=1}^{n} \left( \tilde{v}_{ij}^{(E)} \right)^2$ .    (15)

Under the assumption that the elements $\tilde{v}_{ij}^{(T)}$ and
$\tilde{v}_{ij}^{(E)}$ of the orthogonal score vectors are stochastic
variables, both statistics have a Chi-Squared distribution with n degrees
of freedom, which provides the confidence limits for testing whether the
process behaves normally or abnormally. The confidence limits are usually
selected to include 95% and 99% of the population (EPLS-T² or -Q
values). If a new EPLS-T² or -Q value is below the limit, the hypothesis
that the process behaves normally is accepted; otherwise it is rejected and
the accepted hypothesis is that the process is behaving abnormally.
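The orthogonalisation and the resulting univariate statistic can be sketched as follows. The fragment takes any mean-centred score matrix of the reference data (standing in for either the generalised t-scores or the generalised residual scores), derives the transformation from its SVD, and sums the squared orthogonal scores per data point. The function names and the synthetic, deliberately correlated score matrix are assumptions; the score matrix is also assumed to have full column rank so that the singular values can be inverted.

```python
import numpy as np

def orthogonalising_matrix(S_ref):
    """Return W * inv(Lambda) * sqrt(K-1) from the SVD S_ref = V * Lambda * W^T."""
    K = S_ref.shape[0]
    _, lam, Wt = np.linalg.svd(S_ref, full_matrices=False)
    return (Wt.T @ np.diag(1.0 / lam)) * np.sqrt(K - 1)

def epls_statistic(S, G):
    """Sum of squared orthogonal scores for every row (data point) of S."""
    V_tilde = S @ G
    return np.sum(V_tilde ** 2, axis=1)

# Synthetic correlated, mean-centred scores (illustrative only).
rng = np.random.default_rng(3)
K, n = 400, 3
S_ref = rng.standard_normal((K, n)) @ rng.standard_normal((n, n))
S_ref -= S_ref.mean(axis=0)

G = orthogonalising_matrix(S_ref)
V_tilde = S_ref @ G
cov = V_tilde.T @ V_tilde / (K - 1)         # identity: orthogonal, unit variance
stat_ref = epls_statistic(S_ref, G)
```

The matrix G is computed once from the reference data and then applied unchanged to new data points, so that new scores are judged against the variation captured in the reference data.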

Abnormally large EPLS-T² and/or -Q values may occur if the relationship
between the predictor and response variables, represented by the
parametric regression matrix, has changed (e.g. time variant process) or
the disturbance statistics have changed. Other reasons may be that the
process is operating at a different operating point, excessive variation of
the process has occurred, which was not present in the reference data, or
abnormal process behaviour has occurred. The hypothesis test is
therefore a comparison between the current process operation and the
process operation captured in the reference data. Note that the reference
data describe the process under normal operation and must capture every
variation that can occur under normal operation; otherwise, the statistical
hypothesis test will be too sensitive.

[Wise, 1996] emphasised that the T² statistics, in particular, may not be
normally distributed. [Dunia, 1996] analysed the influence of an
Exponential Weighted Moving Average (EWMA) upon the Q statistic
incorporating PCA. It was found that the Average Run Length (ARL),
that is the average time passed until an abnormal process behaviour is
detected, for detecting faulty conditions on sensors could be reduced by
invoking the EWMA Q-statistic. In this description, an EWMA approach is
applied to the PLS-Q and the EPLS-Q statistic. For these reasons, each
confidence limit is empirically determined in this description as
suggested by [Box, 1978].
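An EWMA filter and an empirically determined confidence limit might be sketched as below. The weighting factor of 0.2, the use of a sample quantile as the empirical limit and the synthetic Q values are assumptions chosen for illustration; the description does not state which values or which empirical procedure were used.

```python
import numpy as np

def ewma(x, weight=0.2):
    """Exponentially weighted moving average; weight is an assumed tuning value."""
    x = np.asarray(x, dtype=float)
    z = np.empty_like(x)
    z[0] = x[0]
    for k in range(1, len(x)):
        z[k] = weight * x[k] + (1.0 - weight) * z[k - 1]
    return z

def empirical_limit(stat_ref, coverage=0.95):
    """Limit below which the given fraction of the reference statistic lies."""
    return np.quantile(stat_ref, coverage)

# Synthetic Q statistic of the reference data (illustrative only).
rng = np.random.default_rng(4)
q_ref = rng.chisquare(df=3, size=1000)

q_ref_filtered = ewma(q_ref)
limit_95 = empirical_limit(q_ref_filtered, 0.95)
limit_99 = empirical_limit(q_ref_filtered, 0.99)

# A new filtered value would be flagged when it exceeds the chosen limit.
fraction_below_95 = np.mean(q_ref_filtered <= limit_95)
```

Determining the limits from the filtered reference statistic, rather than from a theoretical distribution, avoids relying on a distributional assumption that the filtering would in any case disturb.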

The diagnosis of detected abnormal process behaviour can be carried out
by analysing the residuals of the response variables. These residuals can
be plotted at each instance in time in a bar chart. A response variable
with a large residual is considered to be affected by the abnormal
process behaviour and vice versa. Furthermore, [Kourti, 1995] outlined
that a bar chart can also be produced from the residuals of the predictor
variables at each instance in time. If the residual of a specific predictor
variable is large then this variable is also considered to be affected by
the abnormal process behaviour. The "largeness" of the residuals of a
particular predictor or response variable is relative to the residuals of
other predictor and response variables. A comparison of the current
residuals also has to be carried out relative to the residuals of the
reference data. The bar plots are further referred to as the
Error-Contribution Charts (EC-Charts), one for the response and one for the
predictor variables. The bar heights thereby represent the squared
residuals of the response variables and the squared residuals of the
predictor variables. In order to compare these values with each other
statistically, a normalisation has to be carried out. If not, a response
variable that cannot be predicted as well as others, for example, will on
average cause larger bars relative to the other response variables and vice
versa.
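The bar heights of an EC-Chart at one time instance might be computed as in the following sketch. It assumes that the normalisation divides each residual by the standard deviation of the same variable in the reference data; the description only states that a normalisation is required, so this particular choice, the function name and the numerical values are hypothetical.

```python
import numpy as np

def error_contributions(res_ref, res_now):
    """Normalised squared residuals (bar heights) for one time instance."""
    sigma = res_ref.std(axis=0, ddof=1)     # per-variable std of reference residuals
    return (res_now / sigma) ** 2

# Synthetic reference residuals for four response variables (illustrative only).
rng = np.random.default_rng(5)
res_ref = rng.standard_normal((500, 4))

# Residuals at one time instance; the third variable deviates strongly.
res_now = np.array([0.1, -0.2, 5.0, 0.05])

bars = error_contributions(res_ref, res_now)
most_affected = int(np.argmax(bars))        # index of the tallest bar
```

The same function can be applied to the residuals of the predictor variables to obtain the second EC-Chart.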

4 CASE STUDIES

4.1 Fluid Catalytic Cracking Unit 

A fluid catalytic cracking unit or FCCU is an important economic unit in
refining operations. It typically receives several different heavy
feedstocks from other refinery units and cracks these streams to produce
lighter, more valuable components that are eventually blended into
gasoline and other products. The particular Model IV unit described by
[McFarlane, 1993] is illustrated in Figure 1. The principal feed to the
unit is gas oil, but heavier diesel and wash oil streams also contribute to
the total feed stream. Fresh feed is preheated in a heat exchanger and
furnace and then passed to the riser, where it is mixed with hot,
regenerated catalyst from the regenerator. Slurry from the main
fractionator bottoms is also recycled to the riser. The hot catalyst
provides the heat necessary for the endothermic cracking reactions. The
gaseous cracked products are passed to the main fractionator for
separation. Wet gas off the top of the main fractionator is elevated to the
pressure of the light ends plant by the wet gas compressor. Further
separation of light components occurs in this light ends separation
section.

The selected predictor variables for the FCCU case study are given in
Table 1.

PREDICTOR VARIABLE                          SIGNAL
Wash Oil Flowrate                           Constantly zero at all times
Diesel Flowrate                             ARMA sequence
Total Fresh Feed                            ARIMA sequence
Slurry Flowrate                             ARIMA sequence
Preheater Outlet Temperature                ARIMA sequence
Reactor Setpoint                            Constant
Wet-Gas Compressor Suction Valve Position   Depending upon reactor pressure

Table 1: Selected Predictor Variables for FCCU Case Study



All of these variables belong to the feed section of the unit. To simulate
realistic disturbance conditions, various different Autoregressive
Integrated Moving Average (ARIMA) signals were superimposed on
these variables, with the exception of the Diesel Flowrate, which received
only an Autoregressive Moving Average (ARMA) signal, and of the
Wash Oil Flowrate and the Reactor Setpoint, which were zero and
constant, respectively, at all times.
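
For readers unfamiliar with these signal types, the difference between an ARMA and an ARIMA disturbance can be sketched as below. The model orders (ARMA(1,1) and ARIMA(1,1,1)), the coefficients, the noise level and the sequence length are purely illustrative assumptions; the description does not state the parameters used in the simulation.

```python
import numpy as np

def arma_sequence(n, ar=0.8, ma=0.3, sigma=1.0, seed=0):
    """ARMA(1,1): x_i = ar*x_{i-1} + e_i + ma*e_{i-1}, with Gaussian white noise e."""
    rng = np.random.default_rng(seed)
    e = rng.normal(scale=sigma, size=n)
    x = np.zeros(n)
    x[0] = e[0]
    for i in range(1, n):
        x[i] = ar * x[i - 1] + e[i] + ma * e[i - 1]
    return x

def arima_sequence(n, **kwargs):
    """ARIMA(1,1,1): the running sum (one integration) of an ARMA(1,1) sequence."""
    return np.cumsum(arma_sequence(n, **kwargs))

# Hypothetical disturbance signals of 3000 samples each.
stationary_disturbance = arma_sequence(3000, seed=1)   # fluctuates around zero
drifting_disturbance = arima_sequence(3000, seed=2)    # wanders, non-stationary
```

Differencing the ARIMA sequence once recovers the underlying ARMA sequence, which is what distinguishes the drifting signals from the stationary one.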

The response set included Excess Oxygen in the Flue Gas, Concentration
of Carbon Monoxide in the Flue Gas, Riser Temperature, Regenerator
Bed Temperature and Regenerator Standpipe Level, as well as nine further
measured variables from the system; see [McFarlane, 1993] for a
complete list of measured variables for the FCCU system.

To test the PLS and EPLS algorithms, the FCCU simulator was
augmented to include several pre-programmed faults that could be applied
on command. The first was a step change to the coke formation factor of
the feed, which simulated a plug of heavier-than-normal feed entering the
unit. The second simulated a disruption in the flow of regenerated
catalyst between the regenerator and riser, which is typically caused by
partial or complete plugging of steam injectors located in this line.

In the first run, no advanced control system was present, only regulatory
controllers for reactor, air compressor flowrates and the reactor
pressure. With this controller configuration, no feedback between
response and predictor variables was present for runs 1 and 2 because the
Wet-Gas Compressor Suction Valve Position was omitted.

Figure 3 shows the PLS-T² and -Q monitoring charts as well as the
corresponding EPLS charts for a period of approximately 1500 hours of
normal operation. In all figures where T² and Q plots are presented, the
upper solid line represents the 99% confidence limit for the particular
statistic plotted, while the lower dotted line represents the 95%
confidence limit. Furthermore, the ordinate of each T² and Q plot is
logarithmic to base ten. The sampling period was selected to be
30min.

In Figure 4, the responses of the PLS and EPLS T² and Q statistics are
shown for the first fault, injected at approximately 190.5 hours. Since
this fault simulates a change in composition of the feed (a plug of
heavier feed), its effect is felt immediately in the riser and
subsequently, after 191 hours, in other parts of the unit that are
affected by a change in riser conditions. However, there is no direct
mechanistic path back to any part of the feed system, and therefore none
of the predictor variables, as defined for runs 1 and 2, are directly
affected. Neither are they affected by feedback from the response
variables, since no advanced control system is present to provide such
feedback. Therefore, the PLS-T² and -Q charts provide no indication at
all that the fault has created an abnormal condition.

In contrast, the EPLS-T² and -Q statistics plotted in Figure 4 clearly
identify the abnormal condition at the 99% confidence level. The
EC-Chart corresponding to the time at which this event is apparent (after
191 hours) is shown in Figure 5. Variables 12 and 13, Excess Oxygen in the
flue gas and Concentration of Carbon Monoxide in the flue gas,
respectively, are clearly contributors to the event. This makes physical
sense, since a plug of heavier feed will cause a rapid increase in the
amount of coke deposited on the catalyst in the riser and transported to
the regenerator, having a direct effect on oxygen consumption and
production of carbon monoxide. The contribution chart does not point
directly to the potential source of the fault, but does provide an
experienced plant operator with information that would assist in
narrowing down potential causes. In contrast to the EC-Chart for the
response variables, the EC-Chart for the predictor variables does not
show a large contribution for any variable.

In the second run, the regenerated catalyst fault was applied after 322
hours. Again, since the predictor variables all come from the feed section
of the unit, a fault or disturbance occurring in either the reactor or
regenerator, or the connecting catalyst lines, will have no mechanistic
path back to these variables. In this case, the fault only affects response
variables, and conventional PLS-T² and -Q charts will not detect the
event. This is demonstrated in the upper two plots of Figure 6. However,
the EPLS-T² and -Q charts clearly detect the abnormal condition after
322.5 hours. The corresponding EC-Chart for the response variables,
presented in Figure 7, indicates that Excess Oxygen in the flue gas and
Standpipe Level in the regenerator have significant contributions. This is
easily explained since any change in flow of regenerated catalyst will
affect the material balance in the standpipe, and hence its level. A change
in regenerated catalyst flow will also affect the catalyst-to-feed ratio in
the riser, resulting in a change in the amount of coke deposited on spent
catalyst and subsequently the level of oxygen usage in the regenerator.
Note that the EC-Chart for the predictor variables does not show any
abnormally large contribution of any variable because no controller
feedback exists.

For runs three and four, the position of the wet gas compressor suction 
valve was included as a predictor variable. Thus, the effect of any 
disturbance or fault that affects reactor pressure will be transferred to the 
predictor variable set through the feedback action of the reactor pressure 
controller. In this case, both PLS and EPLS would be expected to detect 
an abnormal condition arising from this type of fault, and this is 
demonstrated clearly in Figure 8 for the first fault and Figure 9 for the 
second fault. 

The application of approach I to the FCCU case study has shown that the
PLS monitoring charts can only detect abnormal process behaviour if
controller feedback is present in the predictor variables. This further
implies that, without such feedback, the EC-Charts for the predictor
variables do not show any abnormally large contribution of any variable.
In contrast, the EPLS charts detected both faults. However, in the
presence of controller feedback, the PLS charts are also sensitive.

4.2 Fluidised Bed Reactor

This industrial process produces two solvent chemicals, denoted as F and
G, and consists of several operation units. The core elements of this plant
are continuous operating units in which the chemical reaction is carried
out. These units are five fluidised bed reactors operating in parallel,
each of which produces F and G by complex exothermic chemical reactions.
These reactors are fed with two different streams of five different
reactants. Figure 2 shows one reactor and its adjacent units
schematically.

The first stream comprises the reactant A and the second stream
the reactants B, C, D and E. A and B are molecules of X₂ type, C is an
acid, D are molecules that are produced in upstream units and E are plant
recycles. D and E are vaporised by an upstream vaporiser before entering
the reactor as part of the second stream. Finally, after leaving the
reactors, the separation of F and G is achieved by downstream distillation
units.

The reactors consist of a large shell and a number of vertically oriented
tubes in which the chemical reaction is carried out, supported by fluidised
catalyst. There is a thermocouple at the bottom of each tube to measure
the temperature of the fluidised bed. To remove the heat of the
exothermic reaction, oil circulates around the tubes. The ratio of F:G is
analysed regularly in a lab. Based on this analysis, the F:G ratio is
adjusted by the reactor feed-rates. Furthermore, to keep the catalyst
fluidised at all times, the fluidisation velocity is maintained constant by
adjusting reactor pressure relative to the total flow rate.

The chemical reaction is affected by unmeasured disturbances and
changes in the fluidisation of the catalyst. The most frequently observed
unmeasured disturbance is caused by pressure upsets of the steam flow
required by the vaporiser. Unmeasured disturbances may also be caused
by the coolant (oil), provided by a separate unit. Because of the control
scheme of the vaporising unit, the pressure upsets of the steam flow
result in a larger or smaller flow rate of the second stream entering the
reactor. Fluidisation problems appear if the catalyst is distributed
unevenly along a tube, with considerably more catalyst at the bottom of
the tube. This implies that the chemical reaction is reinforced at the
bottom of the tube, resulting in a significant increase of the tube
temperature.

During a period of several weeks, normal operating data as well as data
containing process abnormalities were obtained for a particular reactor.
The data set for capturing normal process operation (reference data) had
to be selected with care. It had to be ensured that the reference data did
not capture disturbances as described above or fluidisation problems in
one or more tubes. Furthermore, if the reference data set were
too small, then the normal variation occurring during the chemical
reaction might not be captured completely. Each data set describes the
process in steady state operation. For identifying a steady state PLS
model, predictor and response variables had to be chosen. The predictor
variables are the flow rates of reactants A, B, D and E, the steam flow to
the vaporiser and an additional stream required for reducing the pressure
in the vaporiser. As response variables, the temperatures of the tubes
were selected.

A pre-analysis of the data revealed that the tube temperatures are highly
correlated. Furthermore, correlation also exists between the predictor
variables. However, the determination of the number of LVs to be
retained showed that all six LVs contribute significantly towards the
prediction of the response matrix. The number of LVs to retain was
selected by applying Leave-One-Out cross-validation. This
case is therefore an example of the second drawback that arises when
process condition monitoring is carried out by approach I.

Although each score vector of the predictor matrix has to be retained,
PLS reduces the number of process variables considerably. After the
identification procedure was completed, the six generalised t-scores and
the generalised residual scores were computed for the reference data
according to equation (9). This was then followed by calculating the
corresponding values of the PLS-T², EPLS-T² and -Q statistics and the
thresholds for the related monitoring charts. Note that the PLS-Q statistic
cannot be determined.

Figure 12 shows the monitoring charts for the PLS and the EPLS
approach for the reference data. Note that the values of each statistic are
depicted on a logarithmic scale. The graphs of the PLS-T² and EPLS-T²
statistics show natural variation of the process, e.g. due to variations in
feed. Furthermore, the graph of the EPLS-T² statistic shows the impact
of common variation (unmeasured disturbances), which the model cannot
describe.

The first abnormal process behaviour observed represents a large
unmeasured disturbance caused by a drop in steam pressure. The
resultant monitoring charts are shown in Figure 13. Although the steam
rate remains constant, the enthalpy balance within the vaporiser changes
and affects the composition of the D and E reactants within the second
stream to the reactor. The unmeasured disturbance occurred about
1300min into the recorded data set. The EPLS-Q statistic detects this
unmeasured disturbance immediately afterwards because the composition
of the second stream clearly affects the reaction conditions, which the
PLS model cannot describe. The unmeasured disturbance is not picked up
by the PLS-T² and EPLS-T² statistics because the process variation of this
event does not exceed the variation in the reference data. The
corresponding EC-Charts for 1500, 1501 and 1502min, see Figure 14,
show that for about half of the tubes the temperature cannot be
predicted accurately with respect to the reference data. The unmeasured
disturbance clearly affects the reaction condition in all of the tubes,
which could be confirmed by successively investigating the three
EC-Charts. The diagnosis of this abnormal process behaviour is down to an
experienced plant operator who could trace the provided information back
to the drop in steam pressure.

The second abnormal process behaviour describes a fluidisation problem
in one of the tubes. There are some manipulations that a plant operator
can carry out to improve the fluidisation and hence bring the tube
temperature back to its normal operating value. However, the first
temperature rise passed the plant operator unnoticed. When the second
temperature rise was detected, an attempt was made to bring the
temperature back to its normal operating value. Figure 15 shows the
corresponding PLS-T² and the EPLS monitoring charts. Both EPLS
statistics detect in both cases that the tube temperature is abnormally
large. In contrast, the PLS-T² statistic only raises an alarm at the 99%
confidence limit after the second temperature rise. However, the
PLS-T² statistic also exceeds at least the 95% confidence limit and
therefore indicates abnormal process behaviour. The sensitivity of the
PLS-T² chart is due to the feedback of the control loop for the
fluidisation velocity as a consequence of the anomalous tube behaviour.
The manipulation by the process operator can be seen as the sharp kink in
each monitoring chart at around 900min in Figure 15. After this attempt
failed, the plant operator eventually shut the tube down. The monitoring
charts of the EPLS-T² and -Q statistics respond to the shutdown by
shooting off the scale. Figure 16 shows the EC-Charts after 436min,
888min, 905min and 910min. According to the EC-Charts, the plant operator
could have started to take action over 450min earlier to maintain the
operation of that tube.

The application of approach I to this industrial example showed that PLS
monitoring charts can be insensitive to anomalous behaviour of the
process (at least in the first example). In the second example, however,
PLS could detect the abnormal tube behaviour (at least at the 95%
confidence limit). In contrast, the EPLS monitoring charts clearly
detected each process abnormality (at the 99% confidence limit).
Furthermore, each LV had to be retained because of its significance for
predicting the response variables. Consequently, only one monitoring
chart could be obtained for approach I.

The second approach (approach II) would lead to 16 scatter plots and 6
monitoring charts, none of which describes variation of the
response variables. In contrast, EPLS requires only two monitoring charts
and the EPLS-T² statistic does describe variation of the response
variables, see equation 9.

5. CONCLUSIONS 

In this description, the conventional PLS approaches for the condition
monitoring of continuous industrial processes, as described in
[MacGregor, 1991, 1995], [Kresta, 1991] and [Wise, 1996], are revisited
and problem areas are highlighted. This analysis reveals that the
conventional PLS monitoring charts may be either insensitive or difficult
to analyse in the case where the process behaves abnormally. This
description presents an extension to the standard PLS algorithm, referred
to as EPLS, which leads to the definition of new PLS scores, denoted as
generalised scores. In similar fashion to conventional PLS approaches,
statistics can be defined based on the generalised scores of EPLS which
can be plotted versus time on monitoring charts. These monitoring charts
describe overall variation of the predictor and response variables (EPLS
T squared chart) and their residuals (EPLS Q chart).

A theoretical analysis of the monitoring charts derived from the
generalised scores of EPLS and conventional PLS approaches reveals
that:

1. According to approach I, if abnormal behaviour affects response 
variables which are not under closed loop control then this situation 
may remain undetected. With EPLS, the abnormal variation of the 
response variables will be apparent. 

2. In the case where the response variables are highly correlated but the 
predictor variables are not, approach I only produces one chart, the T 
squared chart. With EPLS, both charts remain relevant, irrespective 
of the number of latent variables retained. 

3. The second approach (approach II) defines a number of charts 
dependent upon the number of latent variables retained. In contrast, 
EPLS only requires two charts irrespective of the number of latent 
variables selected. 

4. Using the squared prediction error chart in conjunction with the two
monitoring charts of approach I may overcome the above deficiencies.
However, this would lead to at most three monitoring charts and the
variation of the response variables is not present in any of these
charts. In contrast, EPLS only requires two monitoring charts and the
variation of the response variables is accumulated in the EPLS T
squared chart.

This description also presents two application studies to validate the
theoretically derived results above. The applications relate to the
simulation of a fluid catalytic cracking unit (FCCU) and to a real
industrial process. Two anomalous situations are present in both case
studies, which describe the impact of an unmeasured disturbance and an
"internal" change of the process behaviour.

The results of the FCCU case study clearly demonstrate that controller
feedback is essential for approach I to provide a robust and sensitive
condition monitoring tool. If this is not guaranteed, then this approach
can fail to detect abnormal situations. In contrast, the EPLS approach
provides robust and sensitive monitoring charts irrespective of the
presence of controller feedback.

This can also be confirmed by the industrial case study. The first 
anomalous situation is not detected by the PLS T squared statistic 
because the upset of the steam pressure does not affect the predictor 
variables severely enough to be detected. However, the condition of the 
chemical reaction within the tubes is affected and hence the behaviour of 
the response variables. The second situation describes an abnormal 
behaviour of one of the tubes. In this situation, controller feedback 
affects the predictor variables as a consequence of the anomalous 
behaviour. 

The industrial case study further illustrates that all latent variables
contribute significantly to the prediction of the response variables. Thus,
only one monitoring chart (the T squared chart) is available for
approach I. Furthermore, approach II would lead to a total of 16 scatter
plots and 6 monitoring charts. With this number of relevant charts, it is
cumbersome to detect abnormal behaviour, in contrast to
observing the two EPLS monitoring charts.

Further research on the generalised scores of EPLS focuses on 
applications incorporating dynamic process models, e.g. as required for 
model predictive control. It will be investigated whether this dynamic 
process model also provides the basis for process condition monitoring as 
distinct from the steady state analysis introduced so far. Additionally, the 
applicability of the generalised EPLS scores for the monitoring of batch 
processes will be the subject of future consideration. The PLS approach
for monitoring batch processes, discussed in [Nomikos, 1994, 1995],
will thereby provide the basis for the discrimination of a "good" batch
from a "bad" batch.






APPENDICES 



A.1 Theoretical Analysis of the PLS Algorithm

The PLS identification algorithm relies on determining each pair of
component matrices, $\hat{X}_k$ and $\hat{Y}_k$ (see equation 1), by an
iterative procedure. After the k-th iteration step has been carried out,
the calculated component matrices are subtracted from the predictor and
the response matrix, respectively, prior to computing the (k+1)-th
iteration step. The subtraction of the component matrices is also referred
to as the deflation procedure and is as follows:

$$
\begin{aligned}
X_k &= X_{k-1} - \hat{X}_k = X_{k-1} - t_k p_k^T \\
Y_k &= Y_{k-1} - \hat{Y}_k = Y_{k-1} - \hat{u}_k q_k^T
\end{aligned}
\tag{A1}
$$



The score vectors, $t_k$ and $u_k$, and loading vectors, $p_k$ and $q_k$,
are determined to maximise the contribution of each pair of component
matrices towards the predictor and response matrices. This is achieved by
satisfying the following criteria:

$$
\begin{aligned}
t_k &= X_{k-1} w_k; \quad \|w_k\|_2 - 1 = 0 \\
u_k &= Y_{k-1} v_k; \quad \|v_k\|_2 - 1 = 0 \\
J_{wv} &= \max\{ t_k^T u_k \} = \max\{ w_k^T X_{k-1}^T Y_{k-1} v_k \}
\end{aligned}
\tag{A2}
$$

$$
\begin{aligned}
u_k &= b_k t_k + e_k = \hat{u}_k + e_k \\
J_e &= \min\{ e_k^T e_k \} = \min\{ u_k^T u_k - 2 b_k t_k^T u_k + b_k^2 t_k^T t_k \}
\end{aligned}
\tag{A3}
$$

and

$$
\begin{aligned}
J_p &= \min\{ \mathrm{trace}\{ [X_{k-1} - t_k p_k^T]^T [X_{k-1} - t_k p_k^T] \} \} \\
J_q &= \min\{ \mathrm{trace}\{ [Y_{k-1} - \hat{u}_k q_k^T]^T [Y_{k-1} - \hat{u}_k q_k^T] \} \}
\end{aligned}
\tag{A4}
$$


Solutions to the three cost functions have to be calculated successively.
Beginning with equation (A2), $w_k$ and $v_k$ are referred to as the weight
vectors of the predictor and response matrix, respectively, and $J_{wv}$
represents the value of the corresponding cost function. According to
[Hoskuldsson, 1988], including the constraints on the length of the
weight vectors, equation (A2) can also be rewritten as:

$$
\begin{aligned}
X_{k-1}^T Y_{k-1} Y_{k-1}^T X_{k-1} w_k &= \lambda_k w_k \\
Y_{k-1}^T X_{k-1} X_{k-1}^T Y_{k-1} v_k &= \lambda_k v_k
\end{aligned}
\tag{A5}
$$

Equation (A5) shows that $w_k$ is the eigenvector associated with the
largest eigenvalue of the cross-covariance matrix
$X_{k-1}^T Y_{k-1} Y_{k-1}^T X_{k-1}$, and $v_k$ is the eigenvector
corresponding to the largest eigenvalue of the cross-covariance matrix
$Y_{k-1}^T X_{k-1} X_{k-1}^T Y_{k-1}$. In equation (A3), $b_k$ is the
regression coefficient between the k-th pair of score vectors, $t_k$ and
$u_k$, and $J_e$ is the value of the cost function. The solution to
equation (A3) is the ordinary least squares solution for $b_k$. For
equation (A4), $J_p$ is the related cost function for determining $p_k$,
and $J_q$ for computing $q_k$. The cost functions $J_p$ and $J_q$ are
minimised by the following solution [Geladi, 1986]:

$$
p_k = \frac{X_{k-1}^T t_k}{t_k^T t_k}; \quad
q_k = \frac{Y_{k-1}^T \hat{u}_k}{\hat{u}_k^T \hat{u}_k}
\tag{A6}
$$
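
The solutions of equations (A2) to (A6) for a single latent variable can be checked numerically with the following sketch. It uses the singular value decomposition of the cross-product matrix to obtain the weight vectors, which is one way (not necessarily the one used by the authors) of finding the dominant eigenvectors in equation (A5). The data are random and hypothetical, and the response loading is computed from the predicted u-score b_k t_k, which is an assumption of this example.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical mean-centred data: 200 samples, 4 predictors, 3 responses.
X = rng.normal(size=(200, 4))
Y = X @ rng.normal(size=(4, 3)) + 0.1 * rng.normal(size=(200, 3))
X -= X.mean(axis=0)
Y -= Y.mean(axis=0)

# (A5): the dominant left/right singular vectors of X'Y are the dominant
# eigenvectors of X'YY'X and Y'XX'Y; both have unit length.
U, s, Vt = np.linalg.svd(X.T @ Y)
w, v = U[:, 0], Vt[0, :]
lam = s[0] ** 2                 # largest eigenvalue, shared by both matrices

t = X @ w                       # (A2) t-score
u = Y @ v                       # (A2) u-score
b = (t @ u) / (t @ t)           # (A3) ordinary least squares coefficient
u_hat = b * t                   # predicted u-score
p = X.T @ t / (t @ t)           # (A6) loading vector of the predictor matrix
q = Y.T @ u_hat / (u_hat @ u_hat)  # (A6) loading vector of the response matrix

print(np.allclose(X.T @ Y @ Y.T @ X @ w, lam * w))
print(np.allclose(Y.T @ X @ X.T @ Y @ v, lam * v))
```

The same vectors also satisfy the scalar product p'w = 1, which is the property used in the appendix on equation (9).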

Finally, including n latent variables, the matrix of regression
coefficients, $B_{PLS}^{(n)}$, between the predictor and response matrix
can be calculated as [Lindgren, 1993]:

$$
\begin{aligned}
Y_0 &= X_0 B_{PLS}^{(n)} + E_n \\
B_{PLS}^{(n)} &= W_n [P_n^T W_n]^{-1} \mathrm{diag}\{b_n\} Q_n^T
\end{aligned}
\tag{A7}
$$



where $W_n$, $P_n$ and $Q_n$ are matrices storing the n weight vectors,
$w_k$, and loading vectors, $p_k$ and $q_k$, as columns. According to
equation (A2), the weight vector for the predictor matrix, $w_k$, is
multiplied with the deflated predictor matrix, $X_{k-1}$, to determine the
score vector $t_k$. [Lindgren, 1993], however, outlined that the score
vector $t_k$ can also be calculated directly from the original predictor
matrix, $X_0$, as follows:

$$
\begin{aligned}
r_k &= \prod_{i=1}^{k-1} [I - w_i p_i^T] \, w_k \\
T_n &= X_0 [r_1 \; \cdots \; r_n] = X_0 R_n; \quad R_n = W_n [P_n^T W_n]^{-1}
\end{aligned}
\tag{A8}
$$
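
The iterative procedure of this appendix, including the deflation step and the direct calculation of the scores from the original predictor matrix, can be sketched as follows. This is an illustrative implementation under stated assumptions rather than the authors' code: the weight vectors are obtained from a singular value decomposition, the function name pls_nipals and the random data are hypothetical, and the response loading is regressed on the predicted u-score b_k t_k.

```python
import numpy as np

def pls_nipals(X0, Y0, n_lv):
    """PLS by successive deflation; returns W, P, Q, b and the score matrix T."""
    X, Y = X0.copy(), Y0.copy()
    W, P, Q, T, b = [], [], [], [], []
    for _ in range(n_lv):
        U, _, Vt = np.linalg.svd(X.T @ Y)   # weight vectors, see (A5)
        w, v = U[:, 0], Vt[0, :]
        t, u = X @ w, Y @ v                 # score vectors, see (A2)
        bk = (t @ u) / (t @ t)              # inner coefficient, see (A3)
        p = X.T @ t / (t @ t)               # predictor loading, see (A6)
        q = Y.T @ t / (bk * (t @ t))        # equals Y'u_hat/(u_hat'u_hat), u_hat = bk*t
        X = X - np.outer(t, p)              # deflation of the predictor matrix
        Y = Y - bk * np.outer(t, q)         # deflation of the response matrix
        W.append(w)
        P.append(p)
        Q.append(q)
        T.append(t)
        b.append(bk)
    return (np.column_stack(W), np.column_stack(P), np.column_stack(Q),
            np.array(b), np.column_stack(T))

rng = np.random.default_rng(0)
# Hypothetical mean-centred data: 300 samples, 5 predictors, 4 responses.
X0 = rng.normal(size=(300, 5))
X0 -= X0.mean(axis=0)
Y0 = X0 @ rng.normal(size=(5, 4)) + 0.1 * rng.normal(size=(300, 4))
Y0 -= Y0.mean(axis=0)

W, P, Q, b, T = pls_nipals(X0, Y0, n_lv=3)
R = W @ np.linalg.inv(P.T @ W)      # matrix R_n of (A8)
B = R @ np.diag(b) @ Q.T            # regression matrix, see (A7)

print(np.allclose(T, X0 @ R))       # scores from deflation vs. scores from X0 directly
```

Because R_n is defined as W_n multiplied by the inverse of P_n'W_n, the product P_n'R_n is an identity matrix by construction, which can be confirmed with np.allclose(P.T @ R, np.eye(3)).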



A.2 Proof Required for Equation (7)

The prediction of the response matrix based on n retained latent
variables is as follows:

$$
\hat{Y}^{(n)} = T_n \, \mathrm{diag}\{b_n\} \, Q_n^T. \tag{A9}
$$

The t-scores can be calculated directly from the original predictor
matrix, see equation (A8):

$$
T_n = X_0 R_n. \tag{A10}
$$

The PLS regression matrix can therefore be determined as

$$
B_{PLS}^{(n)} = R_n \, \mathrm{diag}\{b_n\} \, Q_n^T. \tag{A11}
$$

Finally, pre-multiplication with $P_n^T$ provides the required equality
(see Appendix A.3):

$$
P_n^T B_{PLS}^{(n)} = \mathrm{diag}\{b_n\} \, Q_n^T. \tag{A12}
$$

A.3 Proof Required for Equation (9)

The definition of the matrix $R_n$, see equation (A8), can be used to prove
that $R_n^T P_n = P_n^T R_n = I_{n \times n}$, where $I_{n \times n}$
denotes an n by n identity matrix. The elements of the matrix product are
defined as follows:

$$
p_i^T r_j = p_i^T [I - w_1 p_1^T] \cdots [I - w_{j-1} p_{j-1}^T] \, w_j;
\quad 1 \le i, j \le n. \tag{A13}
$$

If i is larger than j, equation (A13) is equal to zero because scalar
products occur between $p_i$ and $w_m$, $1 \le m \le j$, each of which is
equal to zero [Hoskuldsson, 1988]. According to equation (A13), if i is
smaller than j, the factors of the matrix product can be reduced up to the
i-th factor, which results in:

$$
p_i^T r_j = (p_i^T - p_i^T w_i p_i^T) \cdots [I - w_{j-1} p_{j-1}^T] \, w_j;
\quad p_i^T w_i = \frac{t_i^T t_i}{t_i^T t_i} = 1. \tag{A14}
$$

Equation (A14) shows that the transposed vector in the leading factor on
the right-hand side is equal to zero and therefore the entire product is
equal to zero. Moreover, the fact that $p_i^T w_i = 1$ also implies that
$p_i^T r_i = 1$. In summary, equations (A13) and (A14) show that the
matrix product $R_n^T P_n$ is equal to an n by n identity matrix.

A.4 Orthogonality of both Generalised Scores

In order to prove that both generalised scores are not orthogonal, the
covariance matrices for both scores, including n LVs, are investigated.
The covariance matrix of the generalised t-scores is given in equation
(A15):

s ( £ = + e; r [t„ + e; ] = s«? + s<;> + sjf + s^. 

s ( r ? -^Rix r K:F w ]c^ =R:s«[3-Rxt« rB ia +3 N. 



S(«) __ u 

k E 

(A15) 



In equation (A15), , S#, S^. , S ( ;j, S ( ;j, B and Sg, represent the 

covariance matrix of the generalised t-scores, the standard t-scores, the 
generalised residual scores, the predictor variables, the prediction error 
of the response variables and the residuals of the predictor variables. 
Furthermore, Sill denotes the cross-covariance matrix of the standard
t-scores and the generalised residual scores and is the cross-covariance
matrix of the prediction error of the response matrix and the
residuals of the predictor matrix. Note that only the covariance matrix

is of diagonal type because the standard t-scores are mutually

orthogonal [Hoskuldsson, 1988]. According to equation (A15), it can be

concluded that the matrices S ( ^? r and S^l are not generally of diagonal 

type, even under the assumption that the columns of the residual 
matrices, E fl and F„, as well as the predictor matrix X consist of white 
noise signals. However, in the theoretical case where the number of 
10 response variables equals the number of manipulated variables and the 
process is decoupled, the covariance matrix S^? r and the cross- 

co variance matrix S^l will be of diagonal type. Consequently, the 

covariance matrix cannot be of diagonal type if n LVs are retained. 
If all M (see equation- 1) LVs are retained then equation (A15) reduces 



15 to: 



C(") _ C(») . D? 
0 . r -0 7T + K w 



Based on the assumption that E a consists of white noise signals, the
covariance matrix S rr is clearly of diagonal type. Because of the pre-
and post-multiplication with generally non-diagonal matrices, however,
the result is that Sj£ } £ . and therefore S^. are solely of symmetric type.

The interpretation of equations (A15) and (A16) has revealed that
neither of the covariance matrices S£ } r and S^J. is of diagonal type.
Hence both generalised score types are generally not orthogonal.

Figure 3: Statistical Monitoring Charts for Normal Operating Data (Upper Charts represent the
PLS Monitoring Charts, PLS-T² and -Q statistics, and Lower Charts show the
EPLS Monitoring Charts, EPLS-T² and -Q statistics)

Figure 4: Statistical Monitoring Charts for the Unmeasured Disturbance (Upper Charts represent
the PLS Monitoring Charts, PLS-T² and -Q statistics, and Lower Charts show the
EPLS Monitoring Charts, EPLS-T² and -Q statistics)

Figure 5: Error Contribution Chart for Time Instance 11460min. The O2 and CO
Concentration in the Stack Gas Flow have the largest Prediction Error





Figure 6: Statistical Monitoring Charts for the Change in the Regenerated Catalyst Flow into Reactor
(Upper Charts represent the PLS Monitoring Charts, PLS-T² and -Q statistics,
and Lower Charts show the EPLS Monitoring Charts, EPLS-T² and -Q statistics)

Figure 7: Error Contribution Chart for the Change of the Regenerated Catalyst Flow to Reactor
at Time Instance 19357min. The Standpipe Catalyst Level and
O2 Concentration in Stack Gas are mostly affected

Figure 8: Statistical Monitoring Charts for Unmeasured Disturbance (Coking Factor); Predictor Variables
include the Wet Gas Compressor Suction Valve (Upper
Charts represent the PLS Monitoring Charts, PLS-T² and -Q statistics,
and Lower Charts show the EPLS Monitoring Charts, EPLS-T² and -Q statistics)

Figure 9: Statistical Monitoring Charts for the Change of the Regenerated Catalyst Flow to the Reactor;
Predictor Variables include the Wet Gas Compressor Suction
Valve (Upper Charts represent the PLS Monitoring Charts, PLS-T²
and -Q statistics, and Lower Charts show the EPLS Monitoring
Charts, EPLS-T² and -Q statistics)

Figure 10: Statistical Monitoring Charts for Normal Operating Data (Upper Chart represents the
PLS Monitoring Chart, PLS-T² and -Q statistic, and Lower Charts show the EPLS
Monitoring Charts, EPLS-T² and -Q statistics)

Figure 11: Statistical Monitoring Charts for the Unmeasured Disturbance (Upper Chart represents
the PLS Monitoring Chart, PLS-T² and -Q statistic, and Lower Charts show the
EPLS Monitoring Charts, EPLS-T² and -Q statistics)

Figure 12: EC-Charts for Steam Pressure Upset at Time Instances 1500min (upper
left plot), 1501min (lower left plot) and 1502min (upper right plot)

Figure 13: Statistical Monitoring Charts for an abnormal behaviour of one of the tubes
(Upper Chart represents the PLS Monitoring Chart, PLS-T² and -Q statistic, and
Lower Charts show the EPLS Monitoring Charts, EPLS-T² and -Q statistics)

Figure 14: EC-Charts for Fluidisation Problem in one of the Tubes at Time Instances
436min (upper left chart), 888min (upper right chart), 905min (lower left chart)
and 910min (lower right chart)



